In this exercise, we will be using functions from the tidyverse package. You can see we’ve added the chunk option message = FALSE to hide the version information that tidyverse normally displays.

library(tidyverse)

(a) Summarise the Datasaurus

Load Datasaurus.csv. This file contains several different datasets, identified by the column dataset, each of which contains points with x and y coordinates.

Use the group_by() and summarise() functions to calculate the means and standard deviations of the x and y variables, grouped by the dataset column. You could also include the correlation between x and y in your summary, using the function cor(x, y).

Then make a scatter plot of x vs y, faceted by the dataset column.

datasaurus <- read_csv("Datasaurus.csv")
datasaurus %>%
  group_by(dataset) %>%
  summarise(across(c(x, y), 
                   list(mean = ~mean(.), 
                        sd = ~sd(.))),
            corr = cor(x, y)) %>%
  ungroup()
# A tibble: 13 × 6
   dataset    x_mean  x_sd y_mean  y_sd    corr
   <chr>       <dbl> <dbl>  <dbl> <dbl>   <dbl>
 1 away         54.3  16.8   47.8  26.9 -0.0641
 2 bullseye     54.3  16.8   47.8  26.9 -0.0686
 3 circle       54.3  16.8   47.8  26.9 -0.0683
 4 dino         54.3  16.8   47.8  26.9 -0.0645
 5 dots         54.3  16.8   47.8  26.9 -0.0603
 6 h_lines      54.3  16.8   47.8  26.9 -0.0617
 7 high_lines   54.3  16.8   47.8  26.9 -0.0685
 8 slant_down   54.3  16.8   47.8  26.9 -0.0690
 9 slant_up     54.3  16.8   47.8  26.9 -0.0686
10 star         54.3  16.8   47.8  26.9 -0.0630
11 v_lines      54.3  16.8   47.8  26.9 -0.0694
12 wide_lines   54.3  16.8   47.8  26.9 -0.0666
13 x_shape      54.3  16.8   47.8  26.9 -0.0656
ggplot(datasaurus, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(vars(dataset), ncol = 5)
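
The across() call in the summary above is shorthand for writing one named expression per output column. As a minimal sketch, the two forms can be compared on a small made-up tibble (the name toy and its values are hypothetical, not taken from Datasaurus.csv):

```r
library(dplyr)

# Toy data (hypothetical values, not from Datasaurus.csv)
toy <- tibble(
  dataset = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 2, 4, 6),
  y = c(2, 4, 6, 6, 4, 2)
)

# across() version, in the same style as the solution
with_across <- toy %>%
  group_by(dataset) %>%
  summarise(across(c(x, y),
                   list(mean = ~mean(.),
                        sd = ~sd(.))),
            corr = cor(x, y))

# Explicit version: one named expression per summary column
explicit <- toy %>%
  group_by(dataset) %>%
  summarise(x_mean = mean(x), x_sd = sd(x),
            y_mean = mean(y), y_sd = sd(y),
            corr = cor(x, y))

# Compare the two results
all.equal(as.data.frame(with_across), as.data.frame(explicit))
```

The across() form scales better when there are many columns or many summary functions; the explicit form is easier to read when there are only a few.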

(b) Convert the pig behaviour data to long form

We’ve seen the data in pig_behaviour_by_time.csv in the lectures.

Use pivot_longer() to convert the seven pig behaviours (in the columns Upright through Nosing_pen) into two columns: Behaviour, naming the behaviour, and Number, giving the number of pigs engaging in that behaviour.

Use mutate() to derive a variable Proportion, representing the proportion of pigs engaging in a particular behaviour, by dividing by the Total_pigs variable.

Store this data frame in a variable, and then display it using glimpse().

pig_behaviour_by_time <- read_csv("pig_behaviour_by_time.csv")
pig_behaviour_longer <- pig_behaviour_by_time %>%
  pivot_longer(Upright:Nosing_pen,
               names_to = "Behaviour",
               values_to = "Number") %>%
  mutate(Proportion = Number / Total_pigs)
glimpse(pig_behaviour_longer)
Rows: 1,960
Columns: 12
$ Pen        <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,…
$ Housing    <chr> "FC", "FC", "FC", "FC", "FC", "FC", "FC", "FC", "FC", "FC",…
$ Treatment  <chr> "HC", "HC", "HC", "HC", "HC", "HC", "HC", "HC", "HC", "HC",…
$ HousTreat  <chr> "FC, HC", "FC, HC", "FC, HC", "FC, HC", "FC, HC", "FC, HC",…
$ Sex        <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M",…
$ Total_pigs <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
$ Time       <chr> "15 mins", "15 mins", "15 mins", "15 mins", "15 mins", "15 …
$ Time_brief <chr> "15m", "15m", "15m", "15m", "15m", "15m", "15m", "15m", "15…
$ Time_hours <dbl> 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,…
$ Behaviour  <chr> "Upright", "Aggression", "Chewing_pig", "Playing", "Vocalis…
$ Number     <dbl> 10, 0, 2, 0, 0, 8, 0, 10, 0, 0, 1, 0, 10, 0, 8, 1, 3, 0, 0,…
$ Proportion <dbl> 1.0, 0.0, 0.2, 0.0, 0.0, 0.8, 0.0, 1.0, 0.0, 0.0, 0.1, 0.0,…
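
To see what pivot_longer() does to the shape of the data, it may help to run the same steps on a tiny example. This is only a sketch: the tibble wide and its values are made up, with two behaviour columns rather than seven.

```r
library(dplyr)
library(tidyr)

# Toy wide data (hypothetical): one row per pen, one column per behaviour
wide <- tibble(
  Pen = c(1, 2),
  Total_pigs = c(10, 8),
  Upright = c(10, 6),
  Playing = c(2, NA)
)

# Same steps as the solution: stack the behaviour columns, then derive a proportion
long <- wide %>%
  pivot_longer(Upright:Playing,
               names_to = "Behaviour",
               values_to = "Number") %>%
  mutate(Proportion = Number / Total_pigs)

# 2 rows x 2 behaviour columns gives 4 rows; a missing count gives a missing proportion
long
```

By the same logic, the 1,960 rows shown by glimpse() correspond to 280 rows in the wide data, each repeated for the seven behaviours. Note that pivot_longer() stores the names as character; if a factor is wanted, mutate(Behaviour = factor(Behaviour)) is one way to convert it.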

(c) Make a summary data frame from the pig behaviour data

Use group_by() and summarise() to calculate the mean proportion of pigs engaging in each behaviour, for each level of Housing and Treatment. (You will need to use na.rm = TRUE, as there is missing data in this dataset.)

Store this data frame in a variable, and then display it using glimpse().

pig_behaviour_means <- pig_behaviour_longer %>%
  group_by(Behaviour, Housing, Treatment) %>%
  summarise(Proportion = mean(Proportion, na.rm = TRUE)) %>%
  ungroup()
`summarise()` has grouped output by 'Behaviour', 'Housing'. You can override
using the `.groups` argument.
glimpse(pig_behaviour_means)
Rows: 28
Columns: 4
$ Behaviour  <chr> "Aggression", "Aggression", "Aggression", "Aggression", "Ch…
$ Housing    <chr> "FC", "FC", "PS", "PS", "FC", "FC", "PS", "PS", "FC", "FC",…
$ Treatment  <chr> "C", "HC", "C", "HC", "C", "HC", "C", "HC", "C", "HC", "C",…
$ Proportion <dbl> 0.02500000, 0.01529412, 0.01428571, 0.03396226, 0.20833333,…
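
The message from summarise() points to the .groups argument. As an alternative sketch, passing .groups = "drop" returns an ungrouped result directly, so the separate ungroup() step is not needed and the message is not printed. The data below are made up for illustration.

```r
library(dplyr)

# Toy long data (hypothetical values), including one missing proportion
toy <- tibble(
  Behaviour = c("Playing", "Playing", "Playing", "Upright", "Upright", "Upright"),
  Housing = c("FC", "FC", "PS", "FC", "FC", "PS"),
  Proportion = c(0.1, NA, 0.3, 0.8, 0.6, 0.9)
)

means <- toy %>%
  group_by(Behaviour, Housing) %>%
  summarise(Proportion = mean(Proportion, na.rm = TRUE),
            .groups = "drop")  # drop all grouping in the result

# Check whether any grouping remains
is_grouped_df(means)
```

Without na.rm = TRUE, any group containing a missing value would have a mean of NA.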

(d) Convert the summary data frame into wide form

Use pivot_wider() on the summary data frame to make one column for each treatment combination (i.e., each combination of Housing and Treatment).

Hint: pivot_wider() allows you to provide more than one variable for names_from; e.g. names_from = c(column1, column2).

pig_behaviour_means %>%
  pivot_wider(names_from = c(Housing, Treatment),
              values_from = Proportion)
# A tibble: 7 × 5
  Behaviour       FC_C  FC_HC   PS_C  PS_HC
  <chr>          <dbl>  <dbl>  <dbl>  <dbl>
1 Aggression    0.025  0.0153 0.0143 0.0340
2 Chewing_pig   0.208  0.178  0.23   0.238 
3 Exploring_pen 0.33   0.336  0.436  0.379 
4 Nosing_pen    0.035  0.0471 0.0829 0.0566
5 Playing       0.0217 0.0188 0.0257 0.0113
6 Upright       0.527  0.601  0.724  0.723 
7 Vocalising    0.115  0.107  0.141  0.166 
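
When names_from is given more than one column, pivot_wider() joins their values to form the new column names, using "_" by default (hence FC_C, FC_HC and so on). The names_sep argument changes the separator. A minimal sketch on made-up data (toy_means and its values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Toy summary data (hypothetical values): one row per
# Behaviour x Housing x Treatment combination
toy_means <- tibble(
  Behaviour = rep(c("Playing", "Upright"), each = 4),
  Housing = rep(c("FC", "FC", "PS", "PS"), times = 2),
  Treatment = rep(c("C", "HC"), times = 4),
  Proportion = c(0.02, 0.03, 0.01, 0.04, 0.5, 0.6, 0.7, 0.8)
)

# One column per Housing/Treatment combination, joined with "." instead of "_"
wide <- toy_means %>%
  pivot_wider(names_from = c(Housing, Treatment),
              values_from = Proportion,
              names_sep = ".")

names(wide)
```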

© 2021 Statistical Consulting Centre, The University of Melbourne.